North and South: Danish voters and Syrian refugees
Task 1: Get spatial data for municipalities in Denmark
You can download administrative boundary data for Denmark from GADM,
the database of Global Administrative Areas, hosted by UC Davis. You do
this by using the getData() function in the raster package.
For GADM data, you need to specify which level of admin boundaries you
wish to download (0 = country, 1 = first-level subdivisions, i.e. regions,
2 = second-level subdivisions, i.e. municipalities, etc.). Read this blog
on the power of the raster package when it comes to available
datasets.
Instructions:
- Load the boundaries of Danish municipalities from the data/ folder, convert them to a simple feature object and transform to CRS 25832.
- Sort the NAME_2 field to see how the Danish municipalities are spelled. You may need to change some spellings later so that the attribute data joins to the spatial data.
# Load the spatial data, project to UTM
mun_sp <- readRDS(________) # it's the gadm_... 2.rds dataset
mun_sf <- ________(mun_sp)
mun <- st_transform(_____, crs = ____)
# Plot so as to check correct location and complete coverage
# Check the names
# Straighten the names (return here after Task 2)
Simple feature collection with 99 features and 13 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 441745.6 ymin: 6049775 xmax: 892801.1 ymax: 6402207
Projected CRS: ETRS89 / UTM zone 32N
First 10 features:
GID_0 NAME_0 GID_1 NAME_1 NL_NAME_1 GID_2 NAME_2
36829 DNK Denmark DNK.1_1 Hovedstaden <NA> DNK.1.1_1 Albertslund
36665 DNK Denmark DNK.1_1 Hovedstaden <NA> DNK.1.2_1 Allerød
36779 DNK Denmark DNK.1_1 Hovedstaden <NA> DNK.1.3_1 Ballerup
37011 DNK Denmark DNK.1_1 Hovedstaden <NA> DNK.1.4_1 Bornholm
36901 DNK Denmark DNK.1_1 Hovedstaden <NA> DNK.1.5_1 Brøndby
VARNAME_2 NL_NAME_2 TYPE_2 ENGTYPE_2 CC_2 HASC_2
36829 <NA> <NA> Kommune Municipality <NA> DK.HS.AB
36665 <NA> <NA> Kommune Municipality <NA> DK.HS.AL
36779 <NA> <NA> Kommune Municipality <NA> DK.HS.BA
37011 <NA> <NA> Kommune Municipality <NA> DK.HS.BO
36901 <NA> <NA> Kommune Municipality <NA> DK.HS.BR
geometry
36829 MULTIPOLYGON (((712057 6173...
36665 MULTIPOLYGON (((700891 6191...
36779 MULTIPOLYGON (((715156 6178...
37011 MULTIPOLYGON (((878103.7 61...
36901 MULTIPOLYGON (((716929 6168...
[ reached 'max' / getOption("max.print") -- omitted 5 rows ]
[1] "Albertslund" "Allerød" "Ballerup"
[4] "Bornholm" "Brøndby" "Christiansø"
[7] "Dragør" "Egedal" "Fredensborg"
[10] "Frederiksberg" "Frederikssund" "Furesø"
[13] "Gentofte" "Gladsaxe" "Glostrup"
[16] "Gribskov" "Halsnæs" "Helsingør"
[19] "Herlev" "Hillerød" "Høje Taastrup"
[22] "Hørsholm" "Hvidovre" "Ishøj"
[25] "København" "Lyngby-Taarbæk" "Rødovre"
[28] "Rudersdal" "Tårnby" "Vallensbæk"
[31] "Århus" "Favrskov" "Hedensted"
[34] "Herning" "Holstebro" "Horsens"
[37] "Ikast-Brande" "Lemvig" "Norddjurs"
[40] "Odder" "Randers" "Ringkøbing-Skjern"
[43] "Samsø" "Silkeborg" "Skanderborg"
[46] "Skive" "Struer" "Syddjurs"
[49] "Viborg" "Aalborg" "Brønderslev"
[52] "Frederikshavn" "Hjørring" "Jammerbugt"
[55] "Læsø" "Mariagerfjord" "Morsø"
[58] "Rebild" "Thisted" "Vesthimmerland"
[61] "Faxe" "Greve" "Guldborgsund"
[64] "Holbæk" "Kalundborg" "Køge"
[67] "Lejre" "Lolland" "Næstved"
[70] "Odsherred" "Ringsted" "Roskilde"
[73] "Slagelse" "Solrød" "Sorø"
[ reached getOption("max.print") -- omitted 24 entries ]
Task 2: Load voting data for 2011 and 2015 and inspect
To move on towards analysis, I have provided you with summarized voting data for the 5 biggest parties in 2011, 2015 and 2019 by municipality. The columns list total votes per party, the sum of the electorate, and the fraction that each party got in a given year.
- Create the elections object from data/elections.rds
- Inspect it to ensure you understand what the columns contain
- Join the data to the municipality shapes mun by the shared name
- Plot the Socialdemokratie fraction across Denmark in 2015 and check whether you got all the municipalities. Fix names if not.
- In which municipalities did the Social Democrats get the highest share of the vote in 2015?
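Before joining, it can help to list the names that appear in one table but not in the other. The sketch below uses two made-up vectors, shape_names and vote_names, and a hypothetical lookup vector fixes. The spellings are assumptions for illustration only, so check the actual values in NAME_2 and in your election data.

```r
# Toy vectors standing in for mun$NAME_2 and the municipality column of the
# election data (hypothetical values, for illustration only)
shape_names <- c("Århus", "Høje Taastrup", "Vesthimmerland", "Odder")
vote_names  <- c("Aarhus", "Høje-Taastrup", "Vesthimmerlands", "Odder")

# Names present in the shapes but absent from the voting data
unmatched <- setdiff(shape_names, vote_names)
print(unmatched)

# Hypothetical lookup: spelling in the shapes -> spelling in the voting data
fixes <- c("Århus" = "Aarhus",
           "Høje Taastrup" = "Høje-Taastrup",
           "Vesthimmerland" = "Vesthimmerlands")

# Replace only the names that have an entry in the lookup
fixed_names <- unname(ifelse(shape_names %in% names(fixes),
                             fixes[shape_names], shape_names))

# After the fix, no shape name should be left without a match
print(setdiff(fixed_names, vote_names))
```

The same setdiff() check, run on the real columns, tells you which municipalities would drop out of the join.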
# Load the summarized election data
elections_data <- readRDS("../data/elections.rds")
# Join the election data with municipality polygons
# Fix the missing municipalities
# Map fraction of Socialdemokratie in 2015 to check that no municipalities are missing
elections %>%
________ %>%
filter(_________) %>% # A.Socialdemokratie
dplyr::select(______) %>%
mapview()
# Which municipalities are the biggest fans of Socialdemokratie?
Simple feature collection with 99 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 441745.6 ymin: 6049775 xmax: 892801.1 ymax: 6402207
Projected CRS: ETRS89 / UTM zone 32N
# A tibble: 99 × 3
# Groups: NAME_2 [99]
NAME_2 pct_vote2015 geometry
<chr> <dbl> <MULTIPOLYGON [m]>
1 Bornholm 56.3 (((878103.7 6112929, 878088.2 6112928, 878087 61129…
2 Morsø 55.8 (((473803.7 6286625, 473923.9 6286624, 473998 62866…
3 Christiansø 52.4 (((892608.6 6148064, 892610.4 6148044, 892618.5 614…
4 Aalborg 52.1 (((531664.6 6318606, 531667.6 6318604, 531732.2 631…
5 Thisted 52.0 (((471987.6 6284534, 472001.9 6284534, 472021.7 628…
6 Ballerup 51.7 (((715156 6178972, 714227 6179009, 713611 6178539, …
7 Odder 51.7 (((566871.2 6191988, 566871.7 6191957, 567173 61919…
8 Nyborg 51.1 (((613677.9 6130271, 613678.7 6130239, 613681.1 613…
9 Skive 50.7 (((482468.4 6275480, 482528.8 6275479, 482528.9 627…
10 Lolland 50.6 (((652471.1 6058085, 652470.1 6058116, 652398.5 605…
# ℹ 89 more rows
Task 3: Look at some of the data
Now that we have a well-structured, complete and spatial dataset,
let’s explore the political preference distribution in space with the
help of the lovely tmap library!
- Filter your elections data for Social Democrats and Danske Folkeparti (Hint: grepl() is a good start)
- then feed the result into tm_shape() and tm_polygons(), faceting along the way by party. Since you have 2 parties, you should get two visuals.
- repeat three times, changing the tm_polygons() data from pct_vote2011 to pct_vote2019
# Let's map the two most popular parties, SD and Danske Folkeparti through time
library(tmap)
elections %>%
  filter(grepl("^A|^O", Party)) %>%
  tm_shape() + tm_facets("Party", ncol = 2) + tm_polygons("pct_vote2011", title = "Percentage of Votes \nin 2011")
elections %>%
  filter(grepl("^A|^O", Party)) %>%
  tm_shape() + tm_facets("Party") + tm_polygons("pct_vote2015", title = "Percentage of Votes \nin 2015")
elections %>%
  filter(grepl("^A|^O", Party)) %>%
  tm_shape() + tm_facets("Party") + tm_polygons("pct_vote2019", title = "Percentage of Votes \nin 2019")
Task 4: Cartogram
As you can see from the maps, the area of municipalities varies considerably. When mapping them, the large areas carry more visual “weight” than small areas, even though just as many people, or more, live in the small areas. Voters in low-density rural regions can thus visually outweigh the high-density urban populations.
One technique for correcting for this is the cartogram. This is a controlled distortion of the regions, expanding some and contracting others, so that the area of each region is proportional to a desired quantity, such as the population. The cartogram also tries to maintain the correct geography as much as possible, by keeping regions in roughly the same place relative to each other.
The cartogram package contains functions for creating
cartograms. You give it a spatial data frame and the name of a column,
and you get back a similar data frame but with regions distorted so that
the region area is proportional to the column value of the regions.
You’ll also use the sf package for computing the areas of newly
generated regions with the st_area() function.
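To make the idea of "area proportional to a quantity" concrete, here is a minimal sketch of the arithmetic only. The three regions and their numbers are made up, and the actual geometric distortion is left to the cartogram package. It shows which area each region would need so that area shares match voter shares.

```r
# Hypothetical regions: area in km2 and number of voters (made-up numbers)
regions <- data.frame(
  name   = c("Rural", "Town", "City"),
  area   = c(900, 90, 10),
  voters = c(10000, 30000, 60000)
)

# Share of the map each region occupies vs. share of voters it holds
regions$area_share  <- regions$area / sum(regions$area)
regions$voter_share <- regions$voters / sum(regions$voters)

# Area each region should get so that area is proportional to voters,
# keeping the total map area unchanged
regions$target_area <- regions$voter_share * sum(regions$area)

# Factor by which each region has to grow (> 1) or shrink (< 1)
regions$scale <- regions$target_area / regions$area
print(regions)
```

In this toy case the rural region covers 90% of the map but holds 10% of the voters, so it has to shrink, while the city has to grow.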
Instructions
The elections sf object should be already loaded in your
environment.
- Load the cartogram package.
- Filter your elections dataset for the Danske Folkeparti votes, creating a DF object
- Plot the 2015 vote percentage against municipality area in the DF data. Deviation from a straight line shows the degree of misrepresentation.
- Create a cartogram scaling to the pct_vote2015 column.
- Check that the DF vote percentage is proportional to the area.
- Plot the pct_vote2015 percentage on the cartogram. Notice how some areas have relatively shrunk or grown.
# load library
library(cartogram)
# Filter out Danske Folkeparti
DF <- elections %>%
filter(grepl("^O", Party))
# Check the spread of votes and municipality area
plot(DF$pct_vote2015, st_area(DF), xlab = "Vote %", ylab = "Area (m2)",
     main = "Dansk Folkeparti fraction per municipality area")
# Make a cartogram, scaling the area to the percentage of DF voters
DF2015 <- cartogram_cont(DF, "pct_vote2015")
# Check the linearity of the DF voters percentage per municipality plot
plot(DF2015$pct_vote2015, st_area(DF2015))
Copacetic cartogram! Now try to rerun the cartogram for the Social Democrats in 2015 and create a visual for both parties’ turnout and total electorate in 2015.
library(cartogram)
# Let's look at Social Democrats in 2015
SD <- elections %>%
filter(grepl("^A", Party))
# Make a cartogram, scaling the area to the total number of votes cast in 2015
SD2015 <- cartogram_cont(SD, "sum2015")
# Now check the linearity of the total voters per municipality cartogram as
# opposed to the reality
plot(SD$sum2015, st_area(SD)) # reality
plot(SD2015$sum2015, st_area(SD2015)) # cartogram
# Make an adjusted map of the 2015 SD and DF voters
plot(SD2015$geometry, col = "pink", main = "% of Social Democrat votes across DK in 2015")
plot(DF2015$geometry, col = "lightblue", main = "% of Danske Folkeparti votes across DK in 2015")
Task 5: Spatial autocorrelation test
If we look at the faceted tmaps, the election results in 2015 seem to be spatially correlated: specifically, the percentage of voters favoring Danske Folkeparti increases as you move towards the German border. This trend is not as visible in the cartogram, where the growth is more apparent in Sjælland and on other islands, like Samsø.
How much similarity and spatial dependence is there, really?
By similarity or positive correlation, we mean: pick any two municipalities (kommuner) that are neighbors, i.e. that share a border, and their attributes will be more similar than those of any two random municipalities. Such autocorrelation or spatial dependence can be a problem when using statistical models that assume, conditional on the model, that the data points are independent.
The spdep package has functions for measures of spatial
autocorrelation, also known as spatial dependency. Computing these
measures first requires you to work out which regions are neighbors via
the poly2nb() function, short for “polygons to neighbors”.
The result is an object of class nb. Then you can compute
the test statistic and run a significance test on the null hypothesis of
no spatial correlation. The significance test can either be done by
Monte-Carlo or theoretical models.
In this example you’ll use the Moran “I” statistic to test the spatial correlation of the Danske Folkeparti voters in 2015.
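To see what the statistic measures before handing it to spdep, here is a minimal base-R sketch of Moran's I on a toy example. The five regions in a row, their values, and the helper names (adj, w, moran_i) are assumptions for illustration; this is not how spdep computes it internally, only the textbook formula.

```r
# Five hypothetical regions in a row; each borders the region next to it
n <- 5
adj <- matrix(0, n, n)
for (i in 1:(n - 1)) {
  adj[i, i + 1] <- 1
  adj[i + 1, i] <- 1
}

# Row-standardize: every row of weights sums to 1
w <- adj / rowSums(adj)

# Made-up attribute with a smooth trend from one end to the other
x <- c(10, 12, 15, 19, 24)
z <- x - mean(x)

# Moran's I = (n / sum of weights) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2)
moran_i <- (n / sum(w)) * sum(w * outer(z, z)) / sum(z^2)

# Expected value under the null hypothesis of no spatial correlation
expected_i <- -1 / (n - 1)

print(c(observed = moran_i, expected = expected_i))
```

A value well above the expectation, as in this toy trend, points to positive spatial autocorrelation: neighbors resemble each other more than randomly paired regions do.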
Instructions I - defining neighbors
- Load the elections spatial dataset with attributes
- Consider simplifying the boundaries if the data is too heavy for your computer and takes long to visualise
- Load the spdep library and create an nb object of neighbors using queen adjacency
- Pass elections to poly2nb() to find the neighbors of each municipality polygon. Assign to nb.
- Get the center points of each municipality by passing elections to st_centroid() and then to st_coordinates(). Assign to mun_centers.
- Update the basic map of the DK municipalities by adding the connections.
  - In the second plot call pass nb and mun_centers.
  - Also pass add = TRUE to add to the existing plot rather than starting a new one.
# Consider simplifying (don't go too high)
mun_sm <- st_cast(st_simplify(mun, dTolerance = 250), to = "MULTIPOLYGON")
plot(mun_sm$geometry)
length(st_is_valid(mun_sm$geometry))
[1] 99
# Use the spdep package
library(spdep)
# Make neighbor list following queen adjacency
nb <- poly2nb(mun_sm$geometry)
nb
Neighbour list object:
Number of regions: 99
Number of nonzero links: 344
Percentage nonzero weights: 3.509846
Average number of links: 3.474747
11 regions with no links:
4 6 10 43 55 57 63 68 79 84 89
16 disjoint connected subgraphs
# Get center points of each municipality
mun_centers <- st_coordinates(st_centroid(mun_sm$geometry))
# Show the connections
plot(mun_sm$geometry)
plot(nb, mun_centers, col = "red", add = TRUE)
Instructions II - Moran’s I
Now that your neighbors are determined and centroids are computed, let’s continue with the Moran’s I statistic.
- Create a subset with municipalities for O.Danske Folkeparti
- Feed the pct_vote2011 vector into moran.test().
  - moran.test() needs a weighted version of the nb object, which you get by calling nb2listw().
  - After you specify your neighbor nb object you should define the weights style = "W". Here, style = "W" indicates that the weights for each spatial unit are standardized to sum to 1 (this is known as row standardization). For example, if municipality 1 has 3 neighbors, each of those neighbors gets a weight of 1/3. This allows for comparability between areas with different numbers of neighbors.
  - You will need another argument both in the spatial weights and at the level of the test. zero.policy = TRUE deals with situations when an area has no neighbors based on your definition of neighbor (many islands in Denmark). When this happens and you don’t include zero.policy = TRUE, you’ll get an error.
- Run the test against the theoretical distribution of Moran’s I statistic. Find the p-value. Can you reject the null hypothesis of no spatial correlation?
- Inspect a map of pct_vote2011.
- Run another Moran I statistic test, this time on Social Democrats.
  - Use 999 Monte-Carlo iterations via moran.mc().
  - The first two arguments are the same as for moran.test().
  - You also need to pass the argument nsim = 999.
  - Note the p-value. Can you reject the null hypothesis this time?
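Row standardization and the no-neighbor case can be sketched with a plain matrix. The adjacency matrix below is made up (region 1 has three neighbors, region 5 is an island), and leaving the island's row at zero is only meant to mimic the idea behind zero.policy = TRUE, not to reproduce what nb2listw() does internally.

```r
# Hypothetical adjacency: region 1 borders regions 2, 3 and 4;
# region 5 is an island with no neighbors
adj <- matrix(c(0, 1, 1, 1, 0,
                1, 0, 0, 0, 0,
                1, 0, 0, 0, 0,
                1, 0, 0, 0, 0,
                0, 0, 0, 0, 0), nrow = 5, byrow = TRUE)

n_neighbors <- rowSums(adj)
has_nb <- n_neighbors > 0

# Divide each row by its neighbor count; rows without neighbors stay at zero
# (dividing them too would give 0 / 0 = NaN)
w <- adj
w[has_nb, ] <- adj[has_nb, ] / n_neighbors[has_nb]

print(round(w, 2))
print(rowSums(w))
```

Each of region 1's three neighbors gets weight 1/3, regions 2 to 4 put their full weight on region 1, and the island contributes nothing.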
# Let's look at Danske Folkeparti in 2015
DF <- elections %>%
_____(____)
# Run a Moran I test on 2015 DF vote
moran.test(DF$_________,
nb2listw(____, style = "W",zero.policy=TRUE),
zero.policy=TRUE)
# Run a Moran I test on 2011 DF vote
moran.test(DF$________,
nb2listw(___, style = "W",zero.policy=TRUE),
zero.policy=TRUE)
# Do a Monte Carlo simulation to get a more reliable p-value
moran.mc(DF$_________,
________(____, zero.policy=TRUE),
zero.policy=TRUE, nsim = 999)
Moran I test under randomisation
data: DF$pct_vote2015
weights: nb2listw(nb, style = "W", zero.policy = TRUE)
n reduced by no-neighbour observations
Moran I statistic standard deviate = 6.6385, p-value = 1.584e-11
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.532756868 -0.011494253 0.006721352
Moran I test under randomisation
data: DF$pct_vote2011
weights: nb2listw(nb, style = "W", zero.policy = TRUE)
n reduced by no-neighbour observations
Moran I statistic standard deviate = 6.2079, p-value = 2.685e-10
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.499144047 -0.011494253 0.006766158
Monte-Carlo simulation of Moran I
data: DF$pct_vote2015
weights: nb2listw(nb, zero.policy = TRUE)
number of simulations + 1: 1000
statistic = 0.53276, observed rank = 1000, p-value = 0.001
alternative hypothesis: greater
Marvelous Moran Testing! When I ran the examples, the p-values of the Moran tests were around 1.584e-11 for 2015 and 2.685e-10 for 2011, showing significant spatial correlation. In the Monte Carlo simulation, the p-value was around 0.001, so I did find significant positive spatial correlation.
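A note on the Monte Carlo p-value: with nsim = 999 and an observed rank of 1000, the p-value is 1/1000 = 0.001, which is the smallest value that 999 simulations can produce. The base-R sketch below illustrates the permutation idea on made-up data (ten regions in a row with a smooth trend). The helper moran_i() and all values are assumptions for illustration, not spdep's implementation.

```r
set.seed(1)

# Textbook Moran's I for values x and a weights matrix w
moran_i <- function(x, w) {
  z <- x - mean(x)
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Ten hypothetical regions in a row with row-standardized weights
n <- 10
adj <- matrix(0, n, n)
for (i in 1:(n - 1)) {
  adj[i, i + 1] <- 1
  adj[i + 1, i] <- 1
}
w <- adj / rowSums(adj)

# Made-up attribute with a clear spatial trend
x <- as.numeric(1:10)
observed <- moran_i(x, w)

# Shuffle the values over the regions many times and recompute the statistic
nsim <- 999
simulated <- replicate(nsim, moran_i(sample(x), w))

# One-sided p-value: how often is a shuffled map at least as clustered?
p_value <- (sum(simulated >= observed) + 1) / (nsim + 1)
print(c(observed = observed, p_value = p_value))
```

Because the observed map itself counts as one of the nsim + 1 arrangements, the p-value can never fall below 1 / (nsim + 1).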
Task 6: Different sorts of neighborhood: 100 to 50 km
Does the result hold if you use a different scale / neighborhood calculation?
Connect the nearest places (islands)
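The two alternative neighbor definitions used in Tasks 6 and 7 can be previewed on a handful of made-up points. In this sketch the coordinates (in km), the helper count_links() and the inclusive upper limit are all assumptions for illustration. It shows that a distance band can leave a remote point without neighbors, whereas a nearest-neighbor rule always links it to something.

```r
# Four hypothetical centroids on a line, coordinates in km
coords <- rbind(a = c(0, 0), b = c(30, 0), c = c(80, 0), d = c(300, 0))
d <- as.matrix(dist(coords))

# Distance band: count the other points within a given limit
count_links <- function(limit) rowSums(d > 0 & d <= limit)
links_100 <- count_links(100)
links_50  <- count_links(50)
print(rbind(links_100, links_50))

# Nearest neighbor: every point gets a partner, however far away
nearest <- apply(d, 1, function(row) names(row)[order(row)[2]])
print(nearest)
```

Point d is isolated under both distance limits, yet still gets c as its nearest neighbor. This is the reason islands stop being "regions with no links" once you switch to k nearest neighbors.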
# Consider simplifying (don't go too high)
mun_sm <- st_cast(st_simplify(mun, dTolerance = 250), to = "MULTIPOLYGON")
plot(mun_sm$geometry)
# Use the spdep package
library(spdep)
# Get center points of each municipality
mun_centers <- st_centroid(mun_sm$geometry, of_largest_polygon = TRUE)
# Make neighbor list from neighbours at 100km distance
nb_100 <- dnearneigh(mun_centers, 0, 1e+05)
plot(mun_sm$geometry)
plot(nb_100, mun_centers, col = "red", add = TRUE)
# Make neighbor list from neighbours at 50km distance
nb_50 <- dnearneigh(mun_centers, 0, 50000)
plot(mun_sm$geometry)
plot(nb_50, mun_centers, col = "blue", add = TRUE)
title(main = "Neighbours within 50 km distance")
Task 7: Different sorts of neighbourhood: k neighbors
# Consider simplifying (don't go too high)
mun_sm <- st_cast(st_simplify(mun, dTolerance = 250), to = "MULTIPOLYGON")
plot(mun_sm$geometry)
# Use the spdep package
library(spdep)
# Get center points of each municipality
mun_centers <- st_centroid(mun_sm$geometry, of_largest_polygon = TRUE)
# Make neighbor list from k neighbours
k3 <- knearneigh(mun_centers, k = 3)
nb_k3 <- knn2nb(k3)
plot(mun_sm$geometry)
plot(nb_k3, mun_centers, col = "red", add = TRUE)
title(main = "3 nearest neighbours")
# Make neighbor list from k neighbours
nb_k5 <- knn2nb(knearneigh(mun_centers, k = 5))
plot(mun_sm$geometry)
plot(nb_k5, mun_centers, col = "red", add = TRUE)
title(main = "5 nearest neighbours")
Task 8: Rerun Moran’s I and MC
Now let’s rerun Moran’s I and MC with different neighbour conceptions.
# Run a Moran I test on Dansk Folkeparti votes in 2015 based on k neighbors
moran.test(DF$_____,
________(nb_k3, style = "W",zero.policy=TRUE),
zero.policy=TRUE)
# Do a Monte Carlo simulation to get a better p-value
moran.mc(DF$_____,
________(nb_k3, zero.policy=TRUE),
zero.policy=TRUE, nsim = 999)
# Run a Moran I test on Dansk Folkeparti votes in 2011 based on k neighbors
moran.test(DF$_____,
________(nb_k3, style = "W",zero.policy=TRUE),
zero.policy=TRUE)
# Do a Monte Carlo simulation to get a better p-value
moran.mc(DF$___________,
________(nb_k3, zero.policy=TRUE),
zero.policy=TRUE, nsim = 999)